Learning Juntas in the Presence of Noise
Abstract
We investigate the combination of two major challenges in computational learning: dealing with huge amounts of irrelevant information and learning from noisy data. We show that large classes of Boolean concepts that depend on only a small fraction of their variables (so-called juntas) can be learned efficiently from uniformly distributed examples that are corrupted by random attribute and classification noise. We present solutions to the many problems that prevent a straightforward generalization of the noise-free case. Additionally, we extend our methods to non-uniformly distributed examples and derive new results for monotone juntas and for parity juntas in this setting. We assume that the attribute noise is generated by a product distribution. Without any restrictions on the attribute noise distribution, learning in the presence of noise is in general impossible: we construct a noise distribution P and a concept class C such that it is impossible to learn C under P-noise.
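The noise setting described in the abstract can be illustrated with a minimal sketch. It assumes the simplest reading of the model: examples are uniform over {0,1}^n, each attribute is flipped independently (one particular product distribution), and the label is flipped with a fixed probability. The function name `noisy_junta_example` and all parameter names are hypothetical and do not come from the paper, whose formal definitions may differ.

```python
import random


def noisy_junta_example(n, relevant, f, attr_noise, label_noise, rng):
    """Draw one noisy example (x_noisy, y_noisy) of a junta.

    Hypothetical helper for illustration only.
    n           -- total number of Boolean attributes
    relevant    -- indices of the few attributes the concept depends on
    f           -- Boolean function of the relevant bits, returning 0 or 1
    attr_noise  -- attr_noise[i] is the flip probability of attribute i
    label_noise -- flip probability of the classification
    rng         -- a random.Random instance
    """
    # Clean example drawn from the uniform distribution on {0,1}^n.
    x = [rng.randint(0, 1) for _ in range(n)]
    # The label depends only on the relevant attributes (the junta property).
    y = f(tuple(x[i] for i in relevant))
    # Attribute noise: each bit is flipped independently, so the noise
    # vector follows a product distribution.
    x_noisy = [bit ^ (rng.random() < attr_noise[i]) for i, bit in enumerate(x)]
    # Classification noise: the label is flipped with probability label_noise.
    y_noisy = y ^ (rng.random() < label_noise)
    return x_noisy, y_noisy


if __name__ == "__main__":
    rng = random.Random(1)
    # A parity junta on attributes 2 and 7 out of n = 10.
    parity = lambda bits: bits[0] ^ bits[1]
    for _ in range(3):
        print(noisy_junta_example(10, [2, 7], parity, [0.1] * 10, 0.2, rng))
```

A learner in this setting sees only the noisy pairs, never the clean example or the set of relevant indices, which is why the noise-free techniques do not carry over directly.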
Related resources
Learning concepts with few unknown relevant attributes from noisy data
In this thesis, we are concerned with two major challenges in computational learning theory: learning in the presence of large amounts of irrelevant information and learning from noisy data. We model the former issue by assuming that the concepts to be learned depend only on few relevant attributes—such concepts are called juntas. The latter issue is modeled by a random noise process that affec...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
On Noise-Tolerant Learning of Sparse Parities and Related Problems
We consider the problem of learning sparse parities in the presence of noise. For learning parities on $r$ out of $n$ variables, we give an algorithm that runs in time $\mathrm{poly}\left(\log\frac{1}{\delta}, \frac{1}{1-2\eta}\right) n^{(1+(2\eta)^2+o(1))r/2}$ and uses only $\frac{r \log(n/\delta)\,\omega(1)}{(1-2\eta)^2}$ samples in the random noise setting under the uniform distribution, where $\eta$ is the noise rate and $\delta$ is the confidence parameter. From previously known result...
Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas with Noise PRELIMINARY VERSION
Given a set of $n$ random $d$-dimensional boolean vectors with the promise that two of them are $\rho$-correlated with each other, how quickly can one find the two correlated vectors? We present a surprising and simple algorithm which, for any constant $\epsilon > 0$, runs in (expected) time $d\,n^{\frac{3\omega}{4}+\epsilon}\,\mathrm{poly}\left(\frac{1}{\rho}\right) < d\,n^{1.8} \cdot \mathrm{poly}\left(\frac{1}{\rho}\right)$, where $\omega < 2.4$ is the exponent of matrix multiplication. This is the first subquadrati...
Parameterized Learnability of k-Juntas and Related Problems
We study the parameterized complexity of learning k-juntas and some variations of juntas. We show the hardness of learning k-juntas and subclasses of k-juntas in the PAC model by reductions from a W[2]-complete problem. On the other hand, as a consequence of a more general result we show that k-juntas are exactly learnable with improper equivalence queries and access to a W[P] oracle. Subject Cl...
Journal: Theor. Comput. Sci.
Volume: 384
Publication date: 2005